Scalable Routing Easy as PIE: a Practical Isometric Embedding Protocol (Technical Report)
We present PIE, a scalable routing scheme that achieves 100% packet delivery
and low path stretch. It is easy to implement in a distributed fashion and
works well when costs are associated with links. Scalability is achieved by using
virtual coordinates in a space of concise dimensionality, which enables greedy
routing based only on local knowledge. PIE is a general routing scheme, meaning
that it works on any graph. We focus however on the Internet, where routing
scalability is an urgent concern. We show analytically and by using simulation
that the scheme scales extremely well on Internet-like graphs. In addition, its
geometric nature allows it to react efficiently to topological changes or
failures by finding new paths in the network at no cost, yielding better
delivery ratios than standard algorithms. The proposed routing scheme needs an
amount of memory polylogarithmic in the size of the network and requires only
local communication between the nodes. Although each node constructs its
coordinates and routes packets locally, the path stretch remains extremely low,
even lower than for centralized or less scalable state-of-the-art algorithms:
PIE always finds short paths and often enough finds the shortest paths.
Comment: This work was previously published in IEEE ICNP'11. The present
document contains an additional optional mechanism, presented in Section
III-D, to further improve performance by using route asymmetry. It also
contains new simulation results.
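The greedy forwarding step described above can be sketched as follows. The 2-D Euclidean coordinates, the toy graph, and the `greedy_route` helper are illustrative assumptions for this sketch, not PIE's actual isometric embedding or metric:

```python
# Minimal sketch of greedy routing on virtual coordinates: each node knows
# only its neighbors' coordinates and forwards the packet to the neighbor
# closest to the destination in the embedding.
import math

def greedy_route(coords, adj, src, dst):
    """Greedy forwarding; returns the path, or None if stuck at a local minimum."""
    def dist(a, b):
        return math.dist(coords[a], coords[b])
    path, current = [src], src
    while current != dst:
        # pick the neighbor that makes the most progress toward dst
        nxt = min(adj[current], key=lambda v: dist(v, dst))
        if dist(nxt, dst) >= dist(current, dst):
            return None  # no neighbor is closer: greedy routing is stuck
        path.append(nxt)
        current = nxt
    return path

# toy 4-node line graph embedded on a line
coords = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (2.0, 0.0), 3: (3.0, 0.0)}
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(greedy_route(coords, adj, 0, 3))  # [0, 1, 2, 3]
```

Note that only local knowledge is used at each hop; a good embedding is what makes the "stuck" branch rare, which is the hard part that PIE's construction addresses.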
Towards Unbiased BFS Sampling
Breadth First Search (BFS) is a widely used approach for sampling large
unknown Internet topologies. Its main advantage over random walks and other
exploration techniques is that a BFS sample is a plausible graph on its own,
and therefore we can study its topological characteristics. However, it has
been empirically observed that incomplete BFS is biased toward high-degree
nodes, which may strongly affect the measurements. In this paper, we first
analytically quantify the degree bias of BFS sampling. In particular, we
calculate the node degree distribution expected to be observed by BFS as a
function of the fraction f of covered nodes, in a random graph RG(pk) with an
arbitrary degree distribution pk. We also show that, for RG(pk), all commonly
used graph traversal techniques (BFS, DFS, Forest Fire, Snowball Sampling, RDS)
suffer from exactly the same bias. Next, based on our theoretical analysis, we
propose a practical BFS-bias correction procedure. It takes as input a
collected BFS sample together with its fraction f. Even though RG(pk) does not
capture many graph properties common in real-life graphs (such as
assortativity), our RG(pk)-based correction technique performs well on a broad
range of Internet topologies and on two large BFS samples of Facebook and Orkut
networks. Finally, we consider and evaluate a family of alternative correction
procedures, and demonstrate that, although they are unbiased for an arbitrary
topology, their large variance makes them far less effective than the
RG(pk)-based technique.
Comment: BFS, RDS, graph traversal, sampling bias correction.
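The degree bias that the paper quantifies is easy to observe empirically. The sketch below (an illustration only, not the paper's RG(pk) analysis or correction procedure) grows a heavy-tailed random tree and shows that an incomplete BFS sample has a much larger mean degree than the underlying graph:

```python
# Incomplete BFS over-samples high-degree nodes: the mean degree of the
# sampled nodes exceeds the true mean degree of the graph.
import random
from collections import deque

def pref_attach_tree(n, seed=0):
    """Random tree grown by preferential attachment (heavy-tailed degrees)."""
    rng = random.Random(seed)
    adj = {0: [1], 1: [0]}
    stubs = [0, 1]                    # degree-weighted attachment list
    for v in range(2, n):
        u = rng.choice(stubs)
        adj[v] = [u]
        adj[u].append(v)
        stubs += [u, v]
    return adj

def bfs_sample(adj, start, budget):
    """First `budget` nodes discovered by BFS from `start`."""
    seen, order, q = {start}, [start], deque([start])
    while q and len(order) < budget:
        for v in adj[q.popleft()]:
            if v not in seen:
                seen.add(v)
                order.append(v)
                q.append(v)
                if len(order) == budget:
                    return order
    return order

adj = pref_attach_tree(2000)
sample = bfs_sample(adj, start=0, budget=100)
true_mean = sum(len(nbrs) for nbrs in adj.values()) / len(adj)
sample_mean = sum(len(adj[v]) for v in sample) / len(sample)
print(true_mean, sample_mean)  # the sample's mean degree is much larger
```

This is the bias (as a function of the covered fraction f) that the RG(pk)-based procedure corrects for.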
The Entropy of Conditional Markov Trajectories
To quantify the randomness of Markov trajectories with fixed initial and
final states, Ekroot and Cover proposed a closed-form expression for the
entropy of trajectories of an irreducible finite state Markov chain. Numerous
applications, including the study of random walks on graphs, require the
computation of the entropy of Markov trajectories conditioned on a set of
intermediate states. However, the expression of Ekroot and Cover does not allow
for computing this quantity. In this paper, we propose a method to compute the
entropy of conditional Markov trajectories through a transformation of the
original Markov chain into a Markov chain that exhibits the desired conditional
distribution of trajectories. Moreover, we express the entropy of Markov
trajectories - a global quantity - as a linear combination of local entropies
associated with the Markov chain states.
Comment: Accepted for publication in IEEE Transactions on Information Theory.
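The "global entropy as a weighted sum of local entropies" idea can be sketched in the simplest setting: trajectories absorbed at a single target state. This is a simplified restatement for illustration, not the paper's general construction for conditioned trajectories:

```python
# For trajectories from state i until first hitting `target`, the trajectory
# entropy decomposes as H_i = sum_k N[i,k] * h_k, where N is the fundamental
# matrix of the chain with `target` made absorbing (expected visit counts)
# and h_k is the local entropy of transition row k.
import numpy as np

def trajectory_entropy(P, target):
    """Entropy (in bits) of the trajectory from each state until it hits `target`."""
    n = P.shape[0]
    transient = [s for s in range(n) if s != target]
    Q = P[np.ix_(transient, transient)]            # dynamics among non-target states
    N = np.linalg.inv(np.eye(len(transient)) - Q)  # expected visits to each state
    logs = np.zeros_like(P)
    mask = P > 0
    logs[mask] = np.log2(P[mask])
    h = -(P * logs).sum(axis=1)                    # local entropy of each row
    return {s: float(N[i] @ h[transient]) for i, s in enumerate(transient)}

# state 0 loops with prob 1/2 and exits to state 1 with prob 1/2:
# the trajectory length is geometric and the entropy is exactly 2 bits.
P = np.array([[0.5, 0.5],
              [0.0, 1.0]])
print(trajectory_entropy(P, target=1))  # {0: 2.0}
```

Here the global quantity (2 bits) equals the expected number of visits to state 0 (which is 2) times its local entropy (1 bit), matching the linear-combination form stated in the abstract.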
Coordinate Descent with Bandit Sampling
Coordinate descent methods usually minimize a cost function by updating a
random decision variable (corresponding to one coordinate) at a time. Ideally,
we would update the decision variable that yields the largest decrease in the
cost function. However, finding this coordinate would require checking all of
them, which would effectively negate the improvement in computational
tractability that coordinate descent is intended to afford. To address this, we
propose a new adaptive method for selecting a coordinate. First, we find a
lower bound on the amount the cost function decreases when a coordinate is
updated. We then use a multi-armed bandit algorithm to learn which coordinates
yield the largest lower bound, interleaving this learning with conventional
coordinate descent updates in which each coordinate is selected with
probability proportional to its expected decrease. We show that our approach improves
the convergence of coordinate descent methods both theoretically and
experimentally.
Comment: appearing at NeurIPS 201
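The selection scheme described above can be sketched on a small quadratic. This is a toy variant with a naive running estimate and an exploration floor, not the paper's algorithm or its guarantees; the function, matrix, and update rule are assumptions for illustration:

```python
# Coordinate descent where the coordinate is sampled with probability
# proportional to a running (bandit-style) estimate of the decrease it
# produces, instead of uniformly at random.
import numpy as np

def bandit_cd(A, b, steps=500, seed=0):
    """Minimize f(x) = 0.5 x^T A x - b^T x with adaptively sampled coordinates."""
    rng = np.random.default_rng(seed)
    n = len(b)
    x = np.zeros(n)
    est = np.ones(n)                         # per-coordinate decrease estimates
    for _ in range(steps):
        p = 0.9 * est / est.sum() + 0.1 / n  # proportional sampling + exploration
        i = rng.choice(n, p=p)
        g = A[i] @ x - b[i]                  # partial derivative along coordinate i
        delta = -g / A[i, i]                 # exact minimization along coordinate i
        x[i] += delta
        # observed decrease of f from this update, folded into the estimate
        est[i] = 0.5 * est[i] + 0.5 * (0.5 * A[i, i] * delta**2)
    return x

A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
print(np.allclose(bandit_cd(A, b), np.linalg.solve(A, b), atol=1e-5))  # True
```

The exploration floor keeps every coordinate's sampling probability bounded away from zero, so no coordinate can be starved by a stale estimate.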
Observer Placement for Source Localization: The Effect of Budgets and Transmission Variance
When an epidemic spreads in a network, a key question is where was its
source, i.e., the node that started the epidemic. If we know the time at which
various nodes were infected, we can attempt to use this information in order to
identify the source. However, maintaining observer nodes that can provide their
infection time may be costly, and we may have a budget on the number of
observer nodes we can maintain. Moreover, some nodes are more informative than
others due to their location in the network. Hence, a pertinent question
arises: Which nodes should we select as observers in order to maximize the
probability that we can accurately identify the source? Inspired by the simple
setting in which the node-to-node delays in the transmission of the epidemic
are deterministic, we develop a principled approach for addressing the problem
even when transmission delays are random. We show that the optimal
observer-placement differs depending on the variance of the transmission delays
and propose approaches in both low- and high-variance settings. We validate our
methods by comparing them against state-of-the-art observer-placements and show
that, in both settings, our approach identifies the source with higher
accuracy.
Comment: Accepted for presentation at the 54th Annual Allerton Conference on
Communication, Control, and Computing.
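The deterministic-delay setting that inspires the approach can be sketched directly. With unit deterministic delays, the true source s satisfies t_o = t_s + d(s, o) for every observer o, so the offsets t_o - d(c, o) are all equal only when candidate c is the source. The graph, observer set, and scoring rule below are illustrative assumptions, not the paper's estimator:

```python
# Locate the source from observer infection times under unit deterministic
# delays: pick the candidate whose offsets t_o - d(c, o) are most consistent.
from collections import deque

def bfs_dist(adj, src):
    """Hop distances from `src` to every reachable node."""
    dist, q = {src: 0}, deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def locate_source(adj, obs_times):
    """Return the candidate minimizing the spread of t_o - d(c, o)."""
    best, best_spread = None, float("inf")
    for c in adj:
        d = bfs_dist(adj, c)
        offsets = [t - d[o] for o, t in obs_times.items()]
        spread = max(offsets) - min(offsets)
        if spread < best_spread:
            best, best_spread = c, spread
    return best

# path graph 0-1-2-3-4; the epidemic starts at node 2 at time 0
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
obs = {0: 2, 4: 2}   # observers 0 and 4 report their infection times
print(locate_source(adj, obs))  # 2
```

With random transmission delays this exact-match logic breaks down, which is why the paper treats the low- and high-variance regimes separately.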
How CSMA/CA With Deferral Affects Performance and Dynamics in Power-Line Communications
Power-line communications (PLC) are becoming a key component in home
networking, because they provide easy and high-throughput connectivity. The
dominant MAC protocol for high data-rate PLC, the IEEE 1901, employs a CSMA/CA
mechanism similar to the backoff process of 802.11. Existing performance
evaluation studies of this protocol assume that the backoff processes of the
stations are independent (the so-called decoupling assumption). However, in
contrast to 802.11, 1901 stations can change their state after sensing the
medium busy, which is regulated by the so-called deferral counter. This
mechanism introduces strong coupling between the stations and, as a result,
makes existing analyses inaccurate. In this paper, we propose a performance
model for 1901, which does not rely on the decoupling assumption. We prove that
our model admits a unique solution for a wide range of configurations and
confirm the accuracy of the model using simulations. Our results show that we
outperform current models based on the decoupling assumption. In addition to
evaluating the performance in steady state, we further study the transient
dynamics of 1901, which is also affected by the deferral counter.
Comment: To appear, IEEE/ACM Transactions on Networking 201
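The deferral-counter mechanic that couples the stations can be sketched as follows. The contention-window and deferral-counter constants here are illustrative, not the standard's exact parameters, and the slot model is deliberately simplified:

```python
# Unlike 802.11, a 1901-style station can escalate its backoff stage after
# merely sensing the medium busy enough times (deferral counter expiry),
# before experiencing any collision of its own.
import random

CW = [8, 16, 32, 64]   # contention window per backoff stage (illustrative)
D  = [0, 1, 3, 15]     # deferral counter per backoff stage (illustrative)

class Station:
    def __init__(self, rng):
        self.rng = rng
        self.stage = 0
        self._redraw()

    def _redraw(self):
        self.backoff = self.rng.randrange(CW[self.stage])
        self.defer = D[self.stage]

    def slot(self, medium_busy):
        """Advance one slot; return True if the station transmits."""
        if medium_busy:
            if self.defer == 0:
                # deferral counter expired: jump to the next stage and redraw
                self.stage = min(self.stage + 1, len(CW) - 1)
                self._redraw()
            else:
                self.defer -= 1
            return False
        if self.backoff == 0:
            return True
        self.backoff -= 1
        return False

s = Station(random.Random(1))
for _ in range(30):           # sensing the medium busy repeatedly...
    s.slot(medium_busy=True)
print(s.stage)                # ...escalates the station to the top stage: 3
```

Because a station's stage now depends on how often *others* keep the medium busy, the backoff processes are coupled, which is precisely why the decoupling assumption fails for 1901.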
Differences Between Hard and Noisy-labeled Samples: An Empirical Study
Extracting noisy or incorrectly labeled samples from a labeled dataset with
hard/difficult samples is an important yet under-explored topic. Two general
and often independent lines of work exist: one focuses on addressing noisy
labels, and the other deals with hard samples. However, when both types of data
are present, most existing methods treat them equally, which results in a
decline in the overall performance of the model. In this paper, we first design
various synthetic datasets with custom hardness and noisiness levels for
different samples. Our proposed systematic empirical study enables us to better
understand the similarities and more importantly the differences between
hard-to-learn samples and incorrectly-labeled samples. These controlled
experiments pave the way for the development of methods that distinguish
between hard and noisy samples. Through our study, we introduce a simple yet
effective metric that filters out noisy-labeled samples while keeping the hard
samples. We study various data partitioning methods in the presence of label
noise and observe that filtering out noisy samples from hard samples with this
proposed metric results in the best datasets as evidenced by the high test
accuracy achieved after models are trained on the filtered datasets. We
demonstrate this for both our created synthetic datasets and for datasets with
real-world label noise. Furthermore, our proposed data partitioning method
significantly outperforms other methods when employed within a semi-supervised
learning framework.
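One common family of such filters, sketched below, uses per-sample training dynamics. This is a generic heuristic for illustration, NOT the paper's metric: it flags a sample as noisy-labeled when the model consistently and confidently pushes it toward a single class other than its given label, whereas hard samples tend to receive low, scattered confidence:

```python
# probs[e, i, c] = predicted probability of class c for sample i at epoch e.
import numpy as np

def flag_noisy(probs, labels, confidence=0.6):
    """Flag sample i if, averaged over epochs, some class other than its
    given label receives more than `confidence` probability."""
    avg = probs.mean(axis=0)                          # (n_samples, n_classes)
    avg_masked = avg.copy()
    avg_masked[np.arange(len(labels)), labels] = -1.0  # ignore the given label
    return avg_masked.max(axis=1) > confidence

# toy dynamics: 3 epochs, 3 samples, 2 classes
probs = np.array([
    [[0.9, 0.1], [0.2, 0.8], [0.55, 0.45]],
    [[0.9, 0.1], [0.1, 0.9], [0.45, 0.55]],
    [[0.9, 0.1], [0.1, 0.9], [0.50, 0.50]],
])
labels = np.array([0, 0, 0])   # sample 1 is mislabeled; sample 2 is merely hard
print(flag_noisy(probs, labels))  # [False  True False]
```

Sample 1 is flagged (the model is confident in the other class), while the hard sample 2 survives the filter despite its high loss.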
Connectivity vs Capacity in Dense Ad Hoc Networks
We study the connectivity and capacity of finite-area ad hoc wireless networks with an increasing number of nodes (dense networks). We find that the properties of the network strongly depend on the shape of the attenuation function. For power-law attenuation functions, connectivity scales, and the available rate per node is known to decrease like 1/sqrt(n). On the contrary, if the attenuation function does not have a singularity at the origin and is uniformly bounded, we obtain bounds on the percolation domain for large node densities, which show that either the network becomes disconnected, or the available rate per node decreases like 1/n.
The Beauty of the Commons: Optimal Load Sharing by Base Station Hopping in Wireless Sensor Networks
In wireless sensor networks (WSNs), the base station (BS) is a critical
sensor node whose failure causes severe data losses. Deploying multiple fixed
BSs improves the robustness, yet requires all BSs to be installed with large
batteries and large energy-harvesting devices due to the high energy
consumption of BSs. In this paper, we propose a scheme to coordinate the
multiple deployed BSs such that the energy supplies required by individual BSs
can be substantially reduced. In this scheme, only one BS is selected to be
active at a time and the other BSs act as regular sensor nodes. We first
present the basic architecture of our system, including how we keep the network
running with only one active BS and how we manage the handover of the role of
the active BS. Then, we propose an algorithm for adaptively selecting the
active BS under the spatial and temporal variations of energy resources. This
algorithm is simple to implement but is also asymptotically optimal under mild
conditions. Finally, by running simulations and real experiments on an outdoor
testbed, we verify that the proposed scheme is energy-efficient, has low
communication overhead and reacts rapidly to network changes.
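The hopping idea can be sketched with a greedy selection rule. This is an illustrative variant, not necessarily the paper's asymptotically optimal algorithm; the cost constants and harvest schedule are assumptions:

```python
# Each round, the base station (BS) with the most stored energy becomes the
# single active BS; the others behave like regular sensor nodes. Active duty
# costs more energy than idling, so the role hops as harvests shift.
def run_rounds(batteries, harvest, active_cost=5.0, idle_cost=1.0):
    """Simulate one harvest schedule; return which BS was active each round."""
    history = []
    for gains in harvest:                    # gains[b]: energy harvested by BS b
        batteries = [e + g for e, g in zip(batteries, gains)]
        active = max(range(len(batteries)), key=lambda b: batteries[b])
        history.append(active)
        batteries = [e - (active_cost if b == active else idle_cost)
                     for b, e in enumerate(batteries)]
    return history

# two BSs: BS 1 harvests more at first, then BS 0 catches up
harvest = [(1.0, 6.0), (1.0, 6.0), (8.0, 0.0), (8.0, 0.0)]
print(run_rounds([10.0, 10.0], harvest))  # [1, 1, 0, 0]
```

The active role follows the energy supply, so no single BS needs a battery or harvester sized for continuous duty, which is the load-sharing benefit the abstract describes.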